119 research outputs found

Automatically extracting functionally equivalent proteins from SwissProt

Author: A Amores
A Meyer
A Wagner
AA Akindahunsi
Andrew CR Martin
CH Wu
E Kretschmann
EJ Stellwag
EV Koonin
F Chen
GX Yu
II Artamonova
JM Hurst
KP O'Brien
LB Koski
Lisa EM McMillan
MC Lill
MY Galperin
RA Notebaart
RL Tatusov
RL Tatusov
S Shibata
SB Rice
SF Altschul
T Hulsen
T Hulsen
V Kunin
V van Noort
WM Fitch
Y Lee
Y Yaron
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

In summary, FOSTA provides an automated analysis of annotations in UniProtKB/Swiss-Prot that enables groups of proteins already annotated as functionally equivalent to be extracted. Our results demonstrate that the vast majority of UniProtKB/Swiss-Prot functional annotations are of high quality, and that FOSTA can interpret annotations successfully. Where FOSTA is not successful, we are able to highlight inconsistencies in UniProtKB/Swiss-Prot annotation. Most of these would have presented equal difficulties for manual interpretation of annotations. We discuss limitations and possible future extensions to FOSTA, and recommend changes to the UniProtKB/Swiss-Prot format that would facilitate text-mining of UniProtKB/Swiss-Prot.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

UCL Discovery

PubMed Central

Enlighten

annot8r: GO, EC and KEGG annotation of EST datasets

Author: A Bairoch
A Conesa
A Papanicolaou
DM Martin
E Camon
EM Zdobnov
J Bai
J Parkinson
J Parkinson
JD Wasmuth
JE Stajich
LB Koski
M Ashburner
M Kanehisa
Mark L Blaxter
MS Boguski
Ralf Schmid
SF Altschul
SR Stürzenbaum
The UniProt Consortium
Publication venue: BioMed Central
Publication date: 01/01/2008

Background: The expressed sequence tag (EST) methodology is an attractive option for the generation of sequence data for species for which no completely sequenced genome is available. The annotation and comparative analysis of such datasets pose a formidable challenge for research groups that do not have the bioinformatics infrastructure of major genome sequencing centres. Therefore, there is a need for user-friendly tools to facilitate the annotation of non-model species EST datasets with well-defined ontologies that enable meaningful cross-species comparisons. To address this, we have developed annot8r, a platform for the rapid annotation of EST datasets with GO-terms, EC-numbers and KEGG-pathways. Results: annot8r automatically downloads all files relevant for the annotation process and generates a reference database that stores UniProt entries, their associated Gene Ontology (GO), Enzyme Commission (EC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) annotation and additional relevant data. For each of GO, EC and KEGG, annot8r extracts a specific sequence subset from the UniProt dataset based on the information stored in the reference database. These three subsets are then formatted for BLAST searches. The user provides the protein or nucleotide sequences to be annotated and annot8r runs BLAST searches against these three subsets. The BLAST results are parsed and the corresponding annotations retrieved from the reference database. The annotations are saved both as flat files and in a relational PostgreSQL results database to facilitate more advanced searches within the results. annot8r is integrated with the PartiGene suite of EST analysis tools. Conclusion: annot8r is a tool that assigns GO, EC and KEGG annotations for data sets resulting from EST sequencing projects both rapidly and efficiently. The benefits of an underlying relational database, flexibility and the ease of use of the program make it ideally suited for non-model species EST-sequencing projects.
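
The parse-and-look-up step described in this abstract can be sketched as follows. This is a minimal illustration, not annot8r's actual code: it assumes BLAST tabular output (12 tab-separated columns, with the e-value in column 11), replaces the reference database with an in-memory dictionary, and uses a hypothetical function name, e-value cutoff and sample data.

```python
from collections import defaultdict


def annotate_from_blast(blast_lines, reference, max_evalue=1e-8):
    """Map each query to the annotation terms of its accepted BLAST hits.

    blast_lines: iterable of tab-separated BLAST tabular rows.
    reference:   dict of subject ID -> list of annotation terms
                 (a stand-in for the reference database; an assumption).
    max_evalue:  hypothetical cutoff; hits above it are ignored.
    """
    annotations = defaultdict(set)
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue <= max_evalue:
            annotations[query].update(reference.get(subject, ()))
    return {query: sorted(terms) for query, terms in annotations.items()}


# Illustrative data only: subject IDs and GO terms are placeholders.
reference = {
    "P12345": ["GO:0004069", "GO:0006520"],
    "Q99999": ["GO:0005515"],
}
blast_lines = [
    "est1\tP12345\t85.0\t100\t15\t0\t1\t100\t1\t100\t1e-30\t200",
    "est1\tQ99999\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25",
    "est2\tQ99999\t70.0\t90\t27\t0\t1\t90\t1\t90\t1e-12\t110",
]
result = annotate_from_blast(blast_lines, reference)
print(result)
```

In this sketch the weak second hit for `est1` is discarded by the cutoff, so `est1` only inherits the terms of its strong hit.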

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Edinburgh Research Explorer

Leicester Research Archive

Phylometrics: a pipeline for inferring phylogenetic trees from a sequence relationship network perspective

Author: Cleber C Ouverney
D Wu
J Felsenstein
JA Eisen
JE Stajich
JF Imhoff
K Tamura
L Fokkens
LB Koski
M Larkin
M Suderman
MS Rappé
NR Pace
P Hugenholtz
R Pethica
S Altschul
S Guindon
Samuel A Smits
TZ DeSantis
W Ludwig
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Comparative sequence analysis of the 16S rRNA gene is frequently used to characterize the microbial diversity of environmental samples. However, sequence similarities do not always imply functional or evolutionary relatedness due to many factors, including unequal rates of change and convergence. Thus, relying on top BLASTN hits for phylogenetic studies may misrepresent the diversity of these constituents. Furthermore, attempts to circumvent this issue by including a large number of BLASTN hits per sequence in one tree to explore their relatedness present other problems. For instance, the multiple sequence alignment will be poor and computationally costly if not relying on manual alignment, and it may be difficult to derive meaningful relationships from the resulting tree. Analyzing sequence relationship networks within collective BLASTN results, however, reveals sequences that are closely related despite low rank. Results: We have developed a web application, Phylometrics, that relies on networks of collective BLASTN results (rather than single BLASTN hits) to facilitate the process of building phylogenetic trees in an automated, high-throughput fashion while offering novel tools to find sequences that are of significant phylogenetic interest with minimal human involvement. The application, which can be installed locally in a laboratory or hosted remotely, utilizes a simple wizard-style format to guide the user through the pipeline without necessitating a background in programming. Furthermore, Phylometrics implements an independent job queuing system that enables users to continue to use the system while jobs are run with little or no degradation in performance. Conclusions: Phylometrics provides a novel data mining method to screen supplied DNA sequences and to identify sequences that are of significant phylogenetic interest using powerful analytical tools. Sequences that are identified as being similar to a number of supplied sequences may provide key insights into their functional or evolutionary relatedness. Users require the same basic computer skills as for navigating most internet applications.
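
The idea of looking at collective BLASTN results rather than single top hits can be illustrated with a small sketch. It is not the Phylometrics implementation: the function name, the `min_queries` threshold and the sample hit lists are all hypothetical, and the only thing it shows is how a subject sequence shared by several queries stands out regardless of its rank in any one hit list.

```python
from collections import defaultdict


def shared_hits(hits_by_query, min_queries=2):
    """Return subjects that appear in the hit lists of several queries.

    hits_by_query: dict of query ID -> ranked list of subject IDs.
    min_queries:   hypothetical threshold on how many queries must share a hit.
    The result maps each shared subject to its sorted list of queries,
    ordered by decreasing number of queries, then by subject ID.
    """
    queries_per_subject = defaultdict(set)
    for query, subjects in hits_by_query.items():
        for subject in subjects:
            queries_per_subject[subject].add(query)
    shared = {
        subject: sorted(queries)
        for subject, queries in queries_per_subject.items()
        if len(queries) >= min_queries
    }
    return dict(sorted(shared.items(), key=lambda item: (-len(item[1]), item[0])))


# Illustrative data only: "x" is never a top hit but is shared by all queries.
hits_by_query = {
    "q1": ["a", "b", "x"],
    "q2": ["c", "x"],
    "q3": ["d", "e", "x", "b"],
}
result = shared_hits(hits_by_query)
print(result)
```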

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

SJSU ScholarWorks

Background: All-versus-all BLAST, which searches for homologous pairs of sequences in a database of proteins, is used to identify potential orthologs, to find new protein families, and to provide rapid access to these homology relationships. As DNA sequencing accelerates and data sets grow, all-versus-all BLAST has become computationally demanding. Methodology/principal findings: We present FastBLAST, a heuristic replacement for all-versus-all BLAST that relies on alignments of proteins to known families, obtained from tools such as PSI-BLAST and HMMer. FastBLAST avoids most of the work of all-versus-all BLAST by taking advantage of these alignments and by clustering similar sequences. FastBLAST runs in two stages: the first stage identifies additional families and aligns them, and the second stage quickly identifies the homologs of a query sequence, based on the alignments of the families, before generating pairwise alignments. On 6.53 million proteins from the non-redundant Genbank database ("NR"), FastBLAST identifies new families 25 times faster than all-versus-all BLAST. Once the first stage is completed, FastBLAST identifies homologs for the average query in less than 5 seconds (8.6 times faster than BLAST) and gives nearly identical results. For hits above 70 bits, FastBLAST identifies 98% of the top 3,250 hits per query. Conclusions/significance: FastBLAST enables research groups that do not have supercomputers to analyze large protein sequence data sets. FastBLAST is open source software and is available at http://microbesonline.org/fastblast

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

A web-based bioinformatics interface applied to the GENOSOJA project: databases and pipelines

Author: Altschul SF
Audic S
Bateman A
Baudet C
Carazzolle MF
Cheng KCK
Dowell RD
Eliseu Binneck
Gonçalo Amarante Guimarães Pereira
Gustavo Gilson Lacerda Costa
Huang X
Jenkinson AM
Kanehisa M
Koski LB
Kulcheski FR
Leandro Costa do Nascimento
Li R
Marcelo Falsarella Carazzolle
Molina L
Rodrigues FA
Schmutz J
Smith TF
Soares-Cavalcanti NM
Suzek BE
Umezawa T
Wanderley-Nogueira AC
Wang L
Yorinori JT
Publication venue: FapUNIFESP (SciELO)
Publication date: 01/01/2012

Crossref

Fast estimation of the difference between two PAM/JTT evolutionary distances in triplets of homologous sequences

Author: A Wagner
Adrian Schneider
B Chor
C Dessimoz
C Dessimoz
C Seoighe
Christophe Dessimoz
DL Swofford
DT Jones
ET Dermitzakis
G Blanc
Gaston H Gonnet
GC Conant
GH Gonnet
GH Gonnet
GH Gonnet
GM Cannarozzi
J Felsenstein
J Felsenstein
LB Koski
M Bulmer
M Hasegawa
M Kellis
Manuel Gil
MO Dayhoff
N Goldman
S Ohno
T Jukes
T Muller
TF DeLuca
Y Van de Peer
YJ Li
Publication venue: BioMed Central
Publication date: 01/01/2006

BACKGROUND: The estimation of the difference between two evolutionary distances within a triplet of homologs is a common operation that is used, for example, to determine which of two sequences is closer to a third one. The most accurate method is currently maximum likelihood over the entire triplet. However, this approach is relatively time consuming. RESULTS: We show that an alternative estimator, based on pairwise estimates and therefore much faster to compute, has almost the same statistical power as the maximum likelihood estimator. We also provide a numerical approximation for its variance, which could otherwise only be estimated through an expensive re-sampling approach such as bootstrapping. An extensive simulation demonstrates that the approximation delivers precise confidence intervals. To illustrate the possible applications of these results, we show how they improve the detection of asymmetric evolution, and the identification of the closest relative to a given sequence in a group of homologs. CONCLUSION: The results presented in this paper constitute a basis for large-scale protein cross-comparisons of pairwise evolutionary distances.

Repository for Publications and Research Data

Crossref

Springer - Publisher Connector

PubMed Central

UCL Discovery

Determinants of weight gain in pregnant women attending a public prenatal care facility in Rio de Janeiro, Brazil: a prospective study, 2005-2007

Author: Abrams B
Ainsworth BE
Amorim AR
Andreto LM
Atalah E
Barker D
Bo S
Bray GA
Butte NF
Caulfield LE
Chasan-Taber L
Elisa Maria de Aquino Lacerda
Forsén T
Freedman DS
Friedwald WT
Gilberto Kac
Gordon CC
Gunderson EP
Han TS
Hellerstedt WL
Helm P
Hickey CA
Kac G
Kac G
Kac G
Kac G
Konno SC
Lacerda EMA
Lahti-Koski M
Maria Helena Constantino Spyrides
Melo ASO
Michael Maia Schlüssel
Mongoven M
Must A
Nucci LB
Nucci LB
Olson CM
Patricia Lima Rodrigues
Pinheiro JC
Saldana TM
Schieve LA
Seeds JW
Sichieri R
Stulbath TE
Takito MY
Thame M
Thorsdottir I
van Lenthe FJ
Ximenes FMA
Publication venue: FapUNIFESP (SciELO)
Publication date: 01/01/2008

Crossref

Quantitative sequence-function relationships in proteins based on gene ontology

Author: A Bairoch
A Bairoch
A Bateman
A Bateman
A Conesa
AE Todd
Arthur M Lesk
CA Wilson
CZ Cai
D Devos
D Devos
Daniel J Blankenberg
E Camon
EL Sonnhammer
J Piatigorsky
JA Gerlt
JA Ranea
JC Whisstock
K Fleming
L Holm
LB Koski
LJ Jensen
M Ashburner
M Shadidy
MA Andrade
MD Ganfornina
N Hulo
Naomi Altman
P Bork
R Karp
RA Laskowski
RA Laskowski
RC Edgar
S Jones
S Nakayama
SB Needleman
SE Brenner
SF Altschul
SR Eddy
SS Jeong
T Doerks
TF Smith
TK Attwood
Vineet Sangar
X Lu
Publication venue: BioMed Central
Publication date: 01/08/2007

Background: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function – the basis of transfer of annotations in databases – must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity from very closely related proteins – for instance, orthologs in different mammals – to very distantly-related proteins at the limit of reliable recognition of homology. Results: We correlated the divergence in sequences determined from pairwise alignments, and the divergence in function determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero. Conclusion: Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.
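
A path length between two terms in an ontology graph, the kind of quantity this abstract uses for function divergence, can be sketched with a breadth-first search. This is an illustration under stated assumptions, not the paper's measure: the toy graph and its term names are hypothetical, edges are treated as undirected, and the abstract's handling of proteins with multiple functions is not reproduced.

```python
from collections import deque


def path_length(parents, term_a, term_b):
    """Shortest number of edges between two terms, or None if unconnected.

    parents: dict of term -> list of parent terms (a toy stand-in for
             the Gene Ontology graph; an assumption).
    Edges are followed in both directions.
    """
    adjacency = {}
    for child, parent_terms in parents.items():
        for parent in parent_terms:
            adjacency.setdefault(child, set()).add(parent)
            adjacency.setdefault(parent, set()).add(child)
    if term_a == term_b:
        return 0
    seen = {term_a}
    queue = deque([(term_a, 0)])
    while queue:
        node, distance = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == term_b:
                return distance + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None


# Hypothetical toy ontology, not real GO identifiers.
parents = {
    "kinase": ["transferase"],
    "transferase": ["catalytic"],
    "hydrolase": ["catalytic"],
    "catalytic": ["molecular_function"],
    "binding": ["molecular_function"],
}
print(path_length(parents, "kinase", "hydrolase"))
print(path_length(parents, "kinase", "binding"))
```

In this toy graph, terms under the same branch are closer than terms that only meet at the root, which is the intuition behind using path length as a divergence measure.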

Crossref

Directory of Open Access Journals

PubMed Central